Coding Code: Investigating Students’ Data Science Skills with Qualitative Methods
Today’s layout
Slides: bit.ly/uscots-coding-code
Investigating student learning through code
What research has been done?
A great deal of research has focused on what to teach in data science courses, but little has focused on how students learn data science concepts.
Thus far we have detailed…
concepts or competencies that ought to be included in data science programs
perspectives on when to teach data science
how to teach data science concepts
methods for integrating data science into the classroom
assorted topics to be considered in data science courses
Drawing on research in Computer Science Education
The Importance of Students’ Attention to Program State (Lewis 2012)
Attends to both the code produced by a student and their learning process
Pairs a student’s code with their debugging behavior side-by-side
Analyses of students’ code like these should not be few and far between. Students’ code offers a unique avenue for qualitative research on the teaching and learning of computing.
A framework for analyzing students’ code (Schulte 2008)
| | Text Surface | Program Execution | Function |
|---|---|---|---|
| Macrostructure | Understanding the overall structure of the program | Understanding the “algorithm” of the program | Understanding the goal / purpose of the program (in its context) |
| Relations | References between blocks, e.g., method calls, object creation | Sequence of method calls, object sequence diagrams | Understanding how sub-goals are related to goals, how function is achieved by subfunctions |
| Blocks | Regions of interest (ROI) that syntactically or semantically build a unit | Operation of a block, a method, or a ROI (as a sequence of statements) | Function of a block, may be seen as a sub-goal |
| Atoms | Language elements | Operation of a statement | Function of a statement, only understandable in context |
Atom
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
Text Surface
How is whitespace being used?
Program Execution
What operation(s) does this statement carry out?
Function
How is this statement related to the broader context of the program?
Block
anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
abline(anterior)
plot(anterior)
Program Execution
What operation(s) does this block carry out?
Function
How is this block related to the broader context of the program?
Relationships Between Blocks
anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
abline(anterior)
plot(anterior)
posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
summary(posterior2)
with(ProximateAnalysisDataOutlier, plot(PSUP~Lipid, las=1, xlab = "Whole-body Lipid Content (%)", ylab = "UP Fatmeter Reading"))
abline(posterior2)
plot(posterior2)
posterior2
How can this be used for learning trajectory research?
Atom-level Analysis
“How does a student’s use of code comments (to structure their analysis) change over time?”
Block-level Analysis
“How does a student’s data analysis process change over time?”
Some tools to guide you
“Filters a vector of values using extraction operator, based on an equality relation with a variable selected from dataframe using $ operator”
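To make the descriptive code quoted above concrete, here is a minimal sketch of a statement it could describe. It uses R's built-in iris data as a stand-in; the data set, the variable names, and the object name `setosa_lengths` are assumptions for illustration and do not come from the student code in these slides.

```r
# Hypothetical illustration with the built-in iris data (not the student data).
# 1. iris$Species selects a variable from the data frame with the $ operator.
# 2. == "setosa" builds a logical vector from an equality relation.
# 3. [ ] (the extraction operator) keeps only the values where that vector is TRUE.
setosa_lengths <- iris$Sepal.Length[iris$Species == "setosa"]

# Inspect how many values survived the filter
length(setosa_lengths)
```

A single atom like this can be described at the text-surface, program-execution, or function level; the quoted code sits at the program-execution level.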
Process coding
uses gerunds (“-ing” words) to connote action in the data (Saldaña 2013)
“Fitting a linear regression, inspecting regression summary, plotting scatterplot of variables in regression, adding a regression line to the plot, visualizing model diagnostics”
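As a sketch of how these process codes might attach to individual statements, the block below mirrors the structure of the Block example, with each code written as a comment beside the statement it describes. The built-in mtcars data and the name `fit` are stand-ins chosen so the example runs on its own; they are assumptions, not the student's data or objects.

```r
# Stand-in data: built-in mtcars, NOT the ProximateAnalysisData from the slides.
fit <- lm(mpg ~ wt, data = mtcars)       # Fitting a linear regression
summary(fit)                             # Inspecting regression summary
plot(mpg ~ wt, data = mtcars, las = 1)   # Plotting scatterplot of variables in regression
abline(fit)                              # Adding a regression line to the plot
plot(fit)                                # Visualizing model diagnostics
```

Keeping the codes next to the statements makes it easier to compare the sequence of actions across a student's scripts over time.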
Let’s give it a try!
Why is this important for data science education?
How can we distinguish merely interesting learning from effective learning (Wiggins and McTighe 2005)?
Questions?
Practical considerations
How much code should I collect?
How can readers trust my analysis?
Excellent resources: Creswell & Poth (2018); Merriam & Tisdell (2016); Miles et al. (2020)